HuggingFace Daily AI Paper Digest 2025.10.23 | Linear Attention Cuts GPU Memory 10x; Dynamic-Clipping PPO Steadily Boosts Scores

Update: 2025-10-23

Description

This episode covers the following 15 papers:

[00:19] 🧠 Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

[00:59] ⚖ BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

[01:40] 🧠 LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

[02:18] 🌍 GigaBrain-0: A World Model-Powered Vision-Language-Action Model

[02:49] 🔄 Language Models are Injective and Hence Invertible

[03:25] 📹 VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos

[04:01] 📲 DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents

[04:55] 🚀 Unified Reinforcement and Imitation Learning for Vision-Language Models

[05:28] 🖼 Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

[06:17] 📊 FinSight: Towards Real-World Financial Deep Research

[07:06] 🧠 Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues

[07:43] 🌍 OmniNWM: Omniscient Driving Navigation World Models

[08:28] 🕳 Attention Sinks in Diffusion Language Models

[09:04] 📄 olmOCR 2: Unit Test Rewards for Document OCR

[09:42] 🧠 KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints


[Follow Us]

You can also find us on the platform below for more information beyond the podcast episodes.

Xiaohongshu (RED): AI速递

